Improving spam mail filtering using classification algorithms with discretization Filter

Authors

  • Supriya S. Shinde
  • Rahul Patil
Abstract

1 ME Computer Student, 2 Professor; 1,2 Computer Department, Pimpri Chinchwad College of Engineering, Nigdi, Pune, Maharashtra, India

Abstract: Email spam, or junk email, is one of the major problems of today's Internet usage: it causes financial damage to organizations and annoys individual users. Among the approaches developed to date to stop spam, filtering is an important and popular technique. Common uses of mail filters include organizing incoming email and removing spam and viruses. A less common use, at some companies, is to inspect outgoing email to ensure that employees comply with applicable rules. Users may also employ a mail filter to prioritize messages and to sort them into folders based on subject matter or other criteria. For many years these needs have driven the pattern recognition and machine learning communities to keep improving email filtering techniques. Mail filters can be installed by the user, either as separate packages or as part of their email client. In email clients, users can define manual filters that automatically sort mail according to criteria they choose. In this paper, we present a survey of the performance of six machine learning methods for spam filtering. Experiments are carried out on different classification and association techniques using the Waikato Environment for Knowledge Analysis (WEKA). Different classifiers are applied to one benchmark dataset in order to evaluate which classifier gives the better result. The dataset is in Attribute-Relation File Format (ARFF). 10-fold cross-validation is used to estimate accuracy. The results of the classification algorithms are compared on the UCI Spambase dataset, and it is found that no single algorithm performs best for spam mail filtering; performance also varies across datasets. Our results show that classification performance improves when filters are applied.
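The evaluation setup described in the abstract (discretize the numeric attributes, then score a classifier with 10-fold cross-validation) can be sketched in a few lines of plain Python. This is an illustration only, not the authors' WEKA workflow: the equal-width binning, the naive Bayes classifier, the synthetic two-feature data, and all function names (`equal_width_bins`, `discretize`, `k_fold_indices`, `cross_validate`, `make_toy_data`) are assumptions made for the example. The abstract does not say which discretization method or which six classifiers were used.

```python
import math
import random
from collections import Counter, defaultdict

def equal_width_bins(values, n_bins):
    """Cut points that split [min, max] into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant columns
    return [lo + width * i for i in range(1, n_bins)]

def discretize(value, cuts):
    """Index of the bin a numeric value falls into (0 .. len(cuts))."""
    return sum(value > c for c in cuts)

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_nb(rows, labels, n_bins):
    """Count class and per-feature bin frequencies for naive Bayes."""
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # (class, feature index) -> bin counts
    for row, y in zip(rows, labels):
        for j, b in enumerate(row):
            feat_counts[(y, j)][b] += 1
    return class_counts, feat_counts, n_bins

def predict_nb(model, row):
    """Pick the class with the highest Laplace-smoothed log posterior."""
    class_counts, feat_counts, n_bins = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, cy in class_counts.items():
        score = math.log(cy / total)
        for j, b in enumerate(row):
            score += math.log((feat_counts[(y, j)][b] + 1) / (cy + n_bins))
        if score > best_score:
            best, best_score = y, score
    return best

def cross_validate(X, y, k=10, n_bins=10):
    """Accuracy of discretization + naive Bayes under k-fold cross-validation."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Fit the cut points on the training folds only, so no test data leaks in.
        cuts = [equal_width_bins([X[i][j] for i in train_idx], n_bins)
                for j in range(len(X[0]))]
        def to_bins(row):
            return [discretize(v, c) for v, c in zip(row, cuts)]
        model = train_nb([to_bins(X[i]) for i in train_idx],
                         [y[i] for i in train_idx], n_bins)
        correct += sum(predict_nb(model, to_bins(X[i])) == y[i] for i in test_idx)
    return correct / len(X)

def make_toy_data(n=200, seed=1):
    """Synthetic stand-in for a spam dataset (NOT Spambase): 1 = spam, 0 = legitimate."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)])
        y.append(label)
    return X, y

X, y = make_toy_data()
print(f"10-fold accuracy: {cross_validate(X, y):.3f}")
```

Every example is tested exactly once, by a model that never saw it during training, which is why cross-validation gives a steadier accuracy estimate than a single train/test split. Comparing classifiers as the abstract describes would mean swapping `train_nb` and `predict_nb` for other learners while keeping the same folds.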

Similar resources

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...


Email classification for Spam Detection using Word Stemming

Unsolicited emails, known as spam, are one of the fast growing and costly problems associated with the Internet today. Among the many proposed solutions, a technique using Bayesian filtering is considered as the most effective weapon against spam. Bayesian filtering works by evaluating the probability of different words appearing in legitimate and spam mails and then classifying them based on t...


A Trust Based System for Enhanced Spam Filtering

The effectiveness of current anti-spam systems is limited by the ability of spammers to adapt to filtering techniques and the lack of incentive for mail servers to filter outgoing spam. A new approach, based on decentralised trust management, is described in this paper. An architecture and protocol, called TOPAS (Trust Overlay Protocol for Anti Spam), are presented. Each mail server records tru...


Publication date: 2014